Thirty Days of Metal — Day 28: Skinning
This series of posts is my attempt to present the Metal graphics programming framework in small, bite-sized chunks for Swift app developers who haven’t done GPU programming before.
If you want to work through this series in order, start here. To download the sample code for this article, go here.
We have seen some simple examples of time-based animation of object transformations, but this time, we will look at the powerful combination of animation and hierarchy.
The purpose of the Model I/O framework is to provide a unified interface to a variety of 3D asset formats. For this reason, it provides abstractions for a number of common concepts, such as meshes, materials, and animations. We have already gotten acquainted with how to load meshes and materials with Model I/O and render them with Metal.
Another foundational concept implemented in Model I/O is the skeleton. A skeleton consists of a set of joints, which are arranged in a hierarchy, just like the node hierarchies we have already worked with. By associating mesh vertices with one or more joints, we can realistically animate the mesh just by animating the transformations of the joints.
In this article, we will discuss how to load a model containing a skeleton and an animation that changes the transforms of the skeleton’s joints over time, a process called skeletal animation.
Rigging and Skinning
To perform skeletal animation, we first need a model that has been appropriately rigged and skinned.
Rigging is the process of creating a skeleton for the mesh. During rigging, the artist selects the set of joints that will allow them the desired amount of control over the movement of the various parts of the model. Many 3D modeling applications include pre-built “rigs” that match common scenarios like bipedal humanoids or quadrupedal creatures.
The figure below provides two different views of the same skeleton. On the left side, the bones’ physical arrangement is shown, overlaid on a bipedal figure. On the right side, the joints are arranged hierarchically, which emphasizes the fact that the “Hips” joint is the root of the hierarchy.
Joints often correspond to common body parts like feet, hips, and spines, but they don’t have to. Skeletons are usually simplified relative to the real-world objects they represent, both for performance reasons and to make animation easier. On the other hand, features like ponytails which do not contain bones in the real world might be rigged with joints in a skeleton to allow artistic control over hair movement during animation. The general rule is that if part of a model needs to be moved independently, it should have one or more associated joints.
After rigging, the artist performs a task called skinning. During skinning, the artist associates each vertex in the mesh with one or more joints. Each joint-vertex pair has a weight that determines the joint’s influence over the vertex. When the joint is animated, the vertex moves in proportion to this weight.
The figure below is a representation of joint-vertex associations. Each dot is a vertex, with the dot’s color illustrating the joint or joints it is influenced by. For example, a fully red dot is only influenced by the upper arm joint (shown in red), while dots closer to the “elbow” are more yellowish in hue to show that they are also partially influenced by the lower arm joint.
The purpose of allowing more than one joint to influence a vertex is so that vertices can move more naturally. If each vertex moved relative to a single joint, the overall motion would appear rigid and not at all lifelike. With the weights allocated as shown above, the arm can flex and bend much more realistically. Done well, skinning can enable much more expressive and flexible animation than the rigid body transforms we used previously.
Skeletal Animation
The kinds of animations we have seen so far have applied to a single object’s transformation. In order to fully unlock the power of animation, we now need to understand hierarchical animation. When we say hierarchical animation, we mean motion of a node relative to its ancestors in a transformation hierarchy, rather than in world space.
One kind of hierarchical animation we can achieve with a rigged mesh is skeletal animation. The hierarchy of joints in a mesh’s skeleton gives us the ability to express relative position and orientation of different parts of a mesh. By animating these positions and orientations over time, we can create graceful articulated figures.
Now that we understand the fundamentals of rigging, skinning, and joint hierarchies, let’s look at how to implement skeletal animation in Metal. We will begin with how skeletons are represented in Model I/O.
Skeletons in Model I/O
Skeletons in Model I/O are represented by the MDLSkeleton class. This type does not contain MDLObjects representing the joints, as you might expect. Instead, it contains a list of joint paths, which are slash-separated strings of names. This list defines the skeletal hierarchy very compactly, in a way that is reminiscent of nested file paths.
For the skeleton in the figures above, here is a portion of the joint paths list:
/Hips
/Hips/Abdomen
/Hips/Abdomen/Torso
/Hips/UpperLegLeft
/Hips/UpperLegLeft/LowerLegLeft
...
Here is a simplified declaration for the MDLSkeleton class:
class MDLSkeleton : MDLObject {
    var jointPaths: [String]
    var jointBindTransforms: MDLMatrix4x4Array
    var jointRestTransforms: MDLMatrix4x4Array
    public init(name: String, jointPaths: [String])
}
Along with the joint paths, the skeleton has two lists of transforms: the bind transforms and the rest transforms.
A bind transform moves a joint from its local coordinate space into the coordinate space of its parent joint. When every joint has its bind transform applied, the resulting arrangement is called the bind pose or reference pose.
From the bind pose, an animator creates an animation by repositioning the joints in a sequence of desired poses, which are stored as a sequence of keyframes. Each keyframe stores a snapshot of the transformation of each animated node at the corresponding key time. At runtime these transformations are interpolated over time to play back the animation.
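As a rough illustration of keyframe playback, the sketch below linearly interpolates a single scalar channel between keyframes. The `Keyframe` type and `sample` function are hypothetical names invented for this example and are not part of Model I/O, which performs its own sampling. Rotations are normally interpolated spherically rather than linearly, which this sketch does not cover.

```swift
// Hypothetical keyframe type for illustration; not part of Model I/O.
struct Keyframe {
    var time: Double
    var value: Double
}

/// Linearly interpolates a value from keyframes sorted by ascending time.
/// Times outside the keyed range clamp to the first or last keyframe.
func sample(_ keyframes: [Keyframe], at time: Double) -> Double {
    guard let first = keyframes.first, let last = keyframes.last else { return 0 }
    if time <= first.time { return first.value }
    if time >= last.time { return last.value }
    // Find the first keyframe at or after the requested time.
    let upperIndex = keyframes.firstIndex { $0.time >= time }!
    let a = keyframes[upperIndex - 1]
    let b = keyframes[upperIndex]
    // Normalized position of `time` between the two surrounding keyframes.
    let t = (time - a.time) / (b.time - a.time)
    return a.value + (b.value - a.value) * t
}
```

Sampling halfway between two keyframes yields the midpoint of their values; sampling before the first or after the last keyframe simply holds the nearest keyed value.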
In contrast, a rest transform is the transform applied to a joint that is not being animated. Since some animations do not affect every joint, the rest transform is the transform applied by default after the bind transform.
The MDLMatrix4x4Array type abstracts access to an array of 4x4 matrices. We can request matrices whose elements are floats or doubles from such an object. Since we use floats in our shaders, we will use the float4x4Array property of these objects below, which is typed as [float4x4], a good match for shader code.
To keep track of the transformation of each joint, we will need a data type that holds a local transformation (which might be animated) and can provide a concatenated matrix representing the joint-space-to-model-space transformation.
A Skeleton Class
It turns out we already have a type that holds a transform and can concatenate transforms: the Node class. Although we don’t need all of its features, we may as well reuse it rather than creating a special type just for joints. We will create a hierarchy of nodes for each skeleton in the Model I/O asset, instantiating one node per joint (I will refer to such nodes as “joint nodes” below).
The joint nodes will not belong to the scene’s node hierarchy. Instead, we need a separate object to hold and manage the joint nodes. This will be our Skeleton class. In addition to holding the joint nodes themselves, the skeleton will hold the bind transforms and the rest transforms of each joint.
The data members of the Skeleton class look like this:
let name: String
let jointPaths: [String]
let inverseBindTransforms: [float4x4]
let restTransforms: [float4x4]

var jointCount: Int {
    return joints.count
}

private var joints = [Node]()
We copy the list of joint paths from the MDLSkeleton so we can match up joint nodes to their respective paths during animation.
To construct a Skeleton from an MDLSkeleton, we first copy the name, joint paths, bind transforms, and rest transforms. In this process, we invert the bind transforms. Then, we generate the joint node hierarchy and assign each joint its resting transform.
init(_ mdlSkeleton: MDLSkeleton) {
    name = mdlSkeleton.name
    jointPaths = mdlSkeleton.jointPaths
    inverseBindTransforms = mdlSkeleton.jointBindTransforms.float4x4Array.map { $0.inverse }
    restTransforms = mdlSkeleton.jointRestTransforms.float4x4Array
    joints = makeSkeletonHierarchy(from: jointPaths)

    for (jointIndex, joint) in zip(0..., joints) {
        joint.transform = restTransforms[jointIndex]
    }
}
The makeSkeletonHierarchy method is responsible for building the joint node hierarchy. We can do this with a two-pass algorithm. In the first pass, we instantiate nodes and build a dictionary that maps from joint paths to joint nodes:
func makeSkeletonHierarchy(from jointPaths: [String]) -> [Node] {
    var joints = [Node]()
    var jointsForPaths = [String : Node]()
    for jointPath in jointPaths {
        let joint = Node()
        joint.name = jointPath
        jointsForPaths[jointPath] = joint
        joints.append(joint)
    }
In the second pass, we build the hierarchy by iterating over the nodes, finding the parent of each node, and adding the node to its parent. We also reset each node’s name to its “unqualified” name rather than its full path. For example, a joint with the path Root/Body/Hips/Abdomen/Torso/Neck/Head/Ear1_R would wind up with the name Ear1_R.
    for jointPath in jointPaths {
        let child = jointsForPaths[jointPath]!
        let parentPath = (jointPath as NSString).deletingLastPathComponent as String
        let parent = jointsForPaths[parentPath]
        child.name = (jointPath as NSString).lastPathComponent as String
        parent?.addChildNode(child)
    }

    return joints
}
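To see the parent lookup in isolation, here is a minimal sketch that maps each joint path to the index of its parent path, using the sample paths listed earlier. The `parentIndices(for:)` function is a hypothetical helper written for this illustration; it uses plain string slicing instead of NSString and stores indices rather than Node references.

```swift
/// Returns, for each joint path, the index of its parent path in the same array,
/// or nil when no parent path is present in the list (the joint is a root).
func parentIndices(for jointPaths: [String]) -> [Int?] {
    // First pass: remember where each path lives in the array.
    var indexForPath = [String: Int]()
    for (index, path) in jointPaths.enumerated() {
        indexForPath[path] = index
    }
    // Second pass: strip the last path component and look up what remains.
    return jointPaths.map { path -> Int? in
        guard let slash = path.lastIndex(of: "/") else { return nil }
        return indexForPath[String(path[..<slash])]
    }
}

let samplePaths = [
    "/Hips",
    "/Hips/Abdomen",
    "/Hips/Abdomen/Torso",
    "/Hips/UpperLegLeft",
    "/Hips/UpperLegLeft/LowerLegLeft",
]
let sampleParents = parentIndices(for: samplePaths)
// sampleParents is [nil, 0, 1, 0, 3]: "/Hips" is the root, and every other
// joint points back at the entry holding its parent path.
```

A flat array of parent indices like this is a compact alternative to object references when the only goal is to concatenate transforms from the root downward.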
With our skeleton assembled, let’s look at how to render a skinned mesh with Metal.
Vertex Skinning in Metal
Like many of the techniques we have implemented, vertex skinning requires relatively little shader code. Most of the complexity arises from keeping track of the various transformation matrices and preparing them for use within the vertex function. In this section, we will first look at how to gather the (potentially animated) transformations of the joints and then consider how to use them to skin our vertices in the vertex function.
Just as we have been doing with our other constant data, we will write the joint transformation matrices into a buffer so they can be accessed in the vertex function. We do this by iterating over the joint nodes in the skeleton and asking each node for its world transform, then writing the result into the constant buffer.
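The article does not show this gathering step, so the following is only a sketch of one common formulation: concatenate each joint's local transform with its parent's to get a joint-to-model matrix, then multiply by the joint's inverse bind transform. The `Joint` struct and `skinningMatrices` function are hypothetical stand-ins for the Node-based skeleton, and the sketch assumes that bind transforms are expressed in model space and that parents precede their children in the array.

```swift
import simd

/// Hypothetical flat joint representation for illustration.
/// `parent` indexes an earlier joint in the array, or is nil for a root.
struct Joint {
    var parent: Int?
    var localTransform: simd_float4x4
}

/// Computes one matrix per joint: (joint-to-model transform) * (inverse bind transform).
/// Assumes every parent appears before its children in `joints`.
func skinningMatrices(joints: [Joint],
                      inverseBindTransforms: [simd_float4x4]) -> [simd_float4x4] {
    var modelTransforms = [simd_float4x4]()
    modelTransforms.reserveCapacity(joints.count)
    for joint in joints {
        if let parent = joint.parent {
            // Concatenate with the already-computed transform of the parent.
            modelTransforms.append(modelTransforms[parent] * joint.localTransform)
        } else {
            modelTransforms.append(joint.localTransform)
        }
    }
    return zip(modelTransforms, inverseBindTransforms).map { $0 * $1 }
}
```

Under these assumptions, a skeleton left in its bind pose produces identity matrices, so the mesh renders unchanged. The resulting array can then be copied into a Metal buffer in the same way as other per-frame constant data.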
To apply the joint transforms in the vertex function, we need two additional pieces of data for each vertex: the joint indices and the joint weights. The joint indices tell us which joints should influence the vertex, and the weights tell us how much influence each joint has.
Since we need this data on a per-vertex basis, we add two vertex attributes, called jointWeights and jointIndices, to our vertex descriptor. jointIndices is a ushort4, a four-element vector of unsigned 16-bit integers, and jointWeights is a float4. This implies that up to four joints can influence each vertex, which is normally a good balance between performance and expressiveness.
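For reference, a Metal vertex descriptor carrying these two extra attributes might be configured as in the sketch below. The attribute order, the tightly packed interleaved layout, and the use of a single vertex buffer at index 0 are all assumptions made for this example; the sample code may instead build a Model I/O vertex descriptor and convert it. The `makeSkinnedVertexDescriptor` name is hypothetical.

```swift
import Metal

/// Builds a vertex descriptor for a packed, interleaved layout of position,
/// normal, texture coordinates, joint indices, and joint weights.
/// Attribute indices and the single buffer at index 0 are assumptions.
func makeSkinnedVertexDescriptor() -> MTLVertexDescriptor {
    // Each entry pairs a vertex format with its size in bytes.
    let formats: [(MTLVertexFormat, Int)] = [
        (.float3, 12),  // position
        (.float3, 12),  // normal
        (.float2, 8),   // texCoords
        (.ushort4, 8),  // jointIndices
        (.float4, 16),  // jointWeights
    ]
    let descriptor = MTLVertexDescriptor()
    var offset = 0
    for (index, (format, size)) in formats.enumerated() {
        descriptor.attributes[index].format = format
        descriptor.attributes[index].offset = offset
        descriptor.attributes[index].bufferIndex = 0
        offset += size
    }
    // The stride is the total size of one interleaved vertex.
    descriptor.layouts[0].stride = offset
    return descriptor
}
```

With this layout, each vertex occupies 56 bytes, of which 24 are the skinning data.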
The expanded vertex structure in our shader code looks like this:
struct SkinnedVertexIn {
    float3 position [[attribute(0)]];
    float3 normal [[attribute(1)]];
    float2 texCoords [[attribute(2)]];
    ushort4 jointIndices [[attribute(3)]];
    float4 jointWeights [[attribute(4)]];
};
We send our joint transforms into the vertex function by adding a new buffer parameter of type constant float4x4 *:
vertex VertexOut skinned_vertex_main(
    SkinnedVertexIn in [[stage_in]],
    ...
    constant float4x4 *jointMatrices [[buffer(3)]],
    ...)
{
Vertex skinning affects the model-space position of the vertex, so it is applied before the model matrix. To determine the “skinning matrix” that takes us from model space to skinned model space, we index into the joint transform array using the current vertex’s joint indices, then add the joint matrices together, weighted by the current vertex’s joint weights:
    float4 modelPosition = float4(in.position, 1.0);
    float4 modelNormal = float4(in.normal, 0.0);
    float4x4 skinningMatrix =
        in.jointWeights[0] * jointMatrices[in.jointIndices[0]] +
        in.jointWeights[1] * jointMatrices[in.jointIndices[1]] +
        in.jointWeights[2] * jointMatrices[in.jointIndices[2]] +
        in.jointWeights[3] * jointMatrices[in.jointIndices[3]];
    modelPosition = skinningMatrix * modelPosition;
    modelNormal = skinningMatrix * modelNormal;
We get the skinned vertex position by multiplying the skinning matrix by the vertex’s model-space position, just as we apply any other transformation. The rest of the vertex function proceeds as normal.
There are no changes needed in the fragment function.
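When debugging skinning, it can help to reproduce the shader's blend on the CPU for a single vertex. The sketch below mirrors the weighted sum from the vertex function in Swift; `skinnedPosition` is a hypothetical helper written for this illustration and is not part of the sample code.

```swift
import simd

/// CPU reference for the blend performed in the vertex function: sums up to
/// four joint matrices, scaled by their weights, then transforms the position.
func skinnedPosition(_ position: SIMD3<Float>,
                     jointIndices: SIMD4<UInt16>,
                     jointWeights: SIMD4<Float>,
                     jointMatrices: [simd_float4x4]) -> SIMD3<Float> {
    var skinningMatrix = simd_float4x4() // starts as the zero matrix
    for i in 0..<4 {
        let jointMatrix = jointMatrices[Int(jointIndices[i])]
        skinningMatrix = skinningMatrix + jointWeights[i] * jointMatrix
    }
    let result = skinningMatrix * SIMD4<Float>(position, 1)
    return SIMD3<Float>(result.x, result.y, result.z)
}
```

For example, a vertex weighted equally between an identity joint and a joint translated two units along x moves one unit along x. Note that the blend only behaves like an ordinary transform when the four weights sum to 1; otherwise the w component of the result drifts away from 1.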
Since we now might be drawing a mix of objects that are skinned and objects that are not skinned, we need to create two render pipeline states (with different vertex descriptors and different vertex functions). We can then select between them at render time.
At this point, we have written all of the code necessary to load a skeleton and apply joint transformations to achieve vertex skinning. Here is the sample app showing a skinned, static 3D model:
A skinned mesh isn’t much use without animation, though, so let’s look at how to load and apply animations.
Skeletal Animation in Model I/O
Model I/O represents skeleton animations with the MDLPackedJointAnimation type. A joint animation consists of timed arrays of transform components (translations, rotations, and scales). These components can be sampled over time and composed to determine the animated transformations of the targeted joints. The “packed” part of the name indicates that the data for each value of each component are stored adjacently in memory (in contrast to sparse or strided storage).
Here is the interface for the MDLPackedJointAnimation class (simplified):
class MDLPackedJointAnimation : MDLObject, MDLJointAnimation {
    var jointPaths: [String]
    var translations: MDLAnimatedVector3Array
    var rotations: MDLAnimatedQuaternionArray
    var scales: MDLAnimatedVector3Array
}
The jointPaths property contains the names of the joints affected by the animation; not all joints in a skeleton need to be affected by every animation. The remaining properties are animated arrays containing transformation data; these arrays can be used to determine the translation, rotation, and scale of a joint at a particular moment in time.
We will write a simple wrapper type that makes working with Model I/O joint animations a little easier: the JointAnimation class. This class will hold the animation data, provide animation timing information, and compute lists of joint transformations for us.
Since Model I/O joint animations do not provide useful information like the start time or duration of the animation, we can add a couple of extensions that make this easier:
extension MDLPackedJointAnimation {
    var minimumTime: TimeInterval {
        return [translations, rotations, scales]
            .reduce(TimeInterval.greatestFiniteMagnitude) { return min($0, $1.minimumTime) }
    }

    var maximumTime: TimeInterval {
        return [translations, rotations, scales]
            .reduce(-TimeInterval.greatestFiniteMagnitude) { return max($0, $1.maximumTime) }
    }
}
We now begin to specify our own JointAnimation class, beginning with the public properties:
class JointAnimation {
    let name: String
    let jointPaths: [String]
    let startTime: TimeInterval
    let duration: TimeInterval
    let translations: MDLAnimatedVector3Array
    let rotations: MDLAnimatedQuaternionArray
    let scales: MDLAnimatedVector3Array
    //...
}
Initializing an animation just consists of copying the relevant properties, using the extensions above for the timing details:
init(_ animation: MDLPackedJointAnimation) {
    name = animation.name
    jointPaths = animation.jointPaths
    translations = animation.translations
    rotations = animation.rotations
    scales = animation.scales

    startTime = animation.minimumTime
    duration = animation.maximumTime - startTime
}
To produce the set of composed, animated transforms for a skeleton at a particular time, we implement the jointTransforms(at:) method:
func jointTransforms(at time: TimeInterval) -> [float4x4] {
    let translationsAtTime = translations.float3Array(atTime: time)
    let rotationsAtTime = rotations.floatQuaternionArray(atTime: time)
    let scalesAtTime = scales.float3Array(atTime: time)
    return zip(translationsAtTime, zip(rotationsAtTime, scalesAtTime)).map {
        let (translation, (orientation, scale)) = $0
        return float4x4(translation: translation, orientation: orientation, scale: scale)
    }
}
This completes the JointAnimation class definition.
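The float4x4(translation:orientation:scale:) initializer used above is a helper from the sample code whose body is not shown here. As a sketch of what such a helper might do, the hypothetical `makeTransform` function below composes the three components so that scale is applied first, then rotation, then translation. The sample code's own helper may differ.

```swift
import simd

/// A possible translation-rotation-scale composition, equivalent to T * R * S.
/// Illustrative only; the sample code's own helper may be written differently.
func makeTransform(translation: SIMD3<Float>,
                   orientation: simd_quatf,
                   scale: SIMD3<Float>) -> simd_float4x4 {
    // Start from the rotation matrix, then scale each of its basis vectors.
    var matrix = simd_float4x4(orientation)
    matrix.columns.0 *= scale.x
    matrix.columns.1 *= scale.y
    matrix.columns.2 *= scale.z
    // Place the translation in the last column.
    matrix.columns.3 = SIMD4<Float>(translation, 1)
    return matrix
}
```

With an identity orientation, a uniform scale of 2, and a translation of (1, 2, 3), the point (1, 1, 1) is first doubled and then offset, landing at (3, 4, 5).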
Animations don’t do anything on their own; they have to be bound to a node. Model I/O uses the MDLAnimationBindComponent component to associate animations with nodes.
We can find the animation bind components associated with a Model I/O object during loading. We will write a small utility extension to retrieve the animation binding if present:
extension MDLObject {
    var animationBind: MDLAnimationBindComponent? {
        return components.filter({
            $0 is MDLAnimationBindComponent
        }).first as? MDLAnimationBindComponent
    }
}
What is inside an animation bind component? Here is a simplified look at the MDLAnimationBindComponent class:
class MDLAnimationBindComponent : NSObject, MDLComponent {
    var jointAnimation: MDLJointAnimation?
    var skeleton: MDLSkeleton?
    var geometryBindTransform: matrix_double4x4
}
An animation bind component holds a reference to a joint animation (which is a protocol to which MDLPackedJointAnimation conforms). It also holds a reference to a skeleton, which allows us to match up the joint names in the animation with their corresponding joint nodes. Finally, it holds a “geometry bind transform”, which is a matrix that transforms skinned vertices into joint space. It is often the identity matrix (in fact, we assume it always is), but we mention it here for completeness.
When loading a Model I/O asset, we check each object for an animation binding and maintain a list of animations, so we can apply them later. We also ensure each animated node has a reference to its skeleton.
if let animationBinding = mdlObject.animationBind {
    if let mdlAnimation = animationBinding.jointAnimation as? MDLPackedJointAnimation {
        let animation = JointAnimation(mdlAnimation)
        animations.append((animation, node))
    }

    if let mdlSkeleton = animationBinding.skeleton {
        node.skinner = Skinner(
            skeletonForMDLSkeleton(mdlSkeleton),
            float4x4(animationBinding.geometryBindTransform))
    }
}
Animation Playback
To play back an animation, we first need a notion of time. We keep track of the current time in a member variable and increment it by the frame duration each time we draw.
Once we have a global timeline established, we need to convert it into the “local time” of the animation. To find the local time, we first subtract the animation’s start time from the global time. The local time is the remainder of this result divided by the animation’s duration. Using the remainder causes the animation to loop as time progresses.
We can add a small method to our Node class to start playing an animation:
func runAnimation(_ animation: JointAnimation) {
    self.animation = animation
}
We also add a method to apply the current animation, if any, to the node’s skeleton:
func update(at time: TimeInterval) {
    if let animation = animation, let skinner = skinner {
        let localTime = max(0, time - animation.startTime)
        let loopTime = fmod(localTime, animation.duration)
        skinner.skeleton.apply(animation: animation, at: loopTime)
    }
}
Here, we first compute the animation’s local time, then tell our skinner’s skeleton to apply the animation at that time.
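One edge case worth keeping in mind is an animation whose duration is zero (a single keyframe, for instance): fmod with a zero divisor produces NaN. The sketch below factors the time mapping into a hypothetical `loopedTime` function with a guard for that case. It is an illustrative variant, not code from the sample project.

```swift
/// Maps a global time onto an animation's looping local timeline.
/// Returns 0 for animations with no duration, avoiding a NaN from the remainder.
func loopedTime(globalTime: Double, startTime: Double, duration: Double) -> Double {
    guard duration > 0 else { return 0 }
    // Times before the animation starts clamp to its beginning.
    let localTime = max(0, globalTime - startTime)
    // The remainder wraps the time back into 0..<duration so playback loops.
    return localTime.truncatingRemainder(dividingBy: duration)
}
```

For an animation that starts at 1 second and lasts 2 seconds, a global time of 7.5 seconds maps to a local time of 0.5 seconds, which is the start of the fourth loop plus half a second.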
To apply a skeletal animation, we supply the time to the animation’s jointTransforms(at:) method, which returns the current transforms of all animated joint nodes.
We then need to apply the animated transforms to the animated joint nodes and the rest transforms to the non-animated joint nodes. Here is the complete implementation of Skeleton’s apply(animation:at:) method:
func apply(animation: JointAnimation, at time: TimeInterval) {
    let animatedTransforms = animation.jointTransforms(at: time)
    for (skeletonJointIndex, jointPath) in zip(0..., jointPaths) {
        if let animationJointIndex = animation.jointPaths.firstIndex(of: jointPath) {
            joints[skeletonJointIndex].transform = animatedTransforms[animationJointIndex]
        } else {
            joints[skeletonJointIndex].transform = restTransforms[skeletonJointIndex]
        }
    }
}
With these changes made, we can tell a node to play an animation and see our skinned animation system in action:
This is great, but playing a running animation isn’t very useful if the character can’t actually run around.
Finding Nodes
We will sometimes find it convenient to locate a node by name. We can do this by writing a recursive method on the Node class that first searches the node’s immediate children and, if the requested name is not found among them, continues through their descendants. The method returns nil if no such descendant exists.
func childNode(named name: String, recursive: Bool = true) -> Node? {
    if let child = childNodes.first(where: { $0.name == name } ) {
        return child
    } else if recursive {
        for child in childNodes {
            if let grandchild = child.childNode(named: name) {
                return grandchild
            }
        }
    }
    return nil
}
Root Motion
To move the character around, we need to update the model matrix of the model’s root node over time. This will cause the animation to be applied relative to the world transform of the root node. This motion can be driven by user input, which is commonly used to move characters in 3D games.
In the sample code, we use a little trigonometry to make the character run in a circle. Assuming we have a reference to the character’s root node (using the method from the previous section), we compose a translation matrix and a rotation matrix to perform root motion over time:
// How wide of a circle to run in
let circuitRadius: Float = 3.0
// How long it takes to complete one circular lap
let circuitDuration: TimeInterval = 6.0
let runTime = fmod(time, circuitDuration)
let runAngle = Float((2.0 * .pi * runTime) / circuitDuration)
let position = SIMD3<Float>(circuitRadius * cos(runAngle), 0, circuitRadius * sin(runAngle))

let rotation = float4x4(rotateAbout: SIMD3<Float>(0, 1, 0), byAngle: runAngle)
let translation = float4x4(translate: position)

character.transform = translation * rotation
With root motion applied, our character is now able to go for a jog around our little world:
In this lengthy article, we have explored how to load and render skinned, animated models with Model I/O and Metal. In the final two articles of this series we will delve into physically-based rendering and postprocessing to add extra degrees of realism to our virtual scenes.